ABSTRACT

DNA-binding and modifying proteins show high specificity but also
exhibit a certain level of promiscuity. Such latent promiscuous
activities comprise the starting points for new protein functions, but
this hypothesis presents a paradox: a new activity can only evolve if it
already exists. How, then, do novel activities evolve? DNA
methyltransferases, for example, are highly divergent in their target
sites, but how transitions toward novel sites occur remains unknown. We
performed laboratory evolution of the DNA methyltransferase M.HaeIII. We
found that new target sites emerged primarily through expansion of the
original site, GGCC, and the subsequent shrinkage of evolved expanded
sites. Variants evolved for sites that are promiscuously methylated by
M.HaeIII [GG(A/T)CC and GGCGCC] carried mutations in 'gate-keeper'
residues. They could thereby methylate novel target sites such as GCGC
and GGATCC that were neither selected for nor present in M.HaeIII. These
'generalist' intermediates were further evolved to obtain variants with
novel target specificities. Our results demonstrate the ease by which
new DNA-binding and modifying specificities evolve and the mechanism by
which they occur at both the protein and DNA levels.

INTRODUCTION 

Much is known about the manner in which proteins interact with DNA to
accomplish a wide variety of cellular functions in a highly specific
manner. Our knowledge, however, of how these functions emerged in the
course of evolution is limited. Several lines of evidence indicate that
latent, weak, promiscuous functions comprise ample starting points for
the evolution of new protein functions (1-3). Such activities can be
dramatically improved via a few mutations, and they ultimately become
the main function (4). Indeed, the primary and promiscuous activities of
evolutionarily related families often overlap, such that the primary
activity of one family is present as a promiscuous side activity in
related families (3,5). Promiscuity fails to explain, however, how
activities that do not pre-exist in a given proteome evolve. The
acquisition of novel activities may proceed through 'generalist'
intermediates that exhibit exceptionally wide ranges of promiscuous
activities (6). It remains unknown, however, how common such
intermediates are, under what circumstances and how they emerge, and
whether they can lead to novel specificities.

Here, we examined the evolution of M.HaeIII, a DNA methyltransferase
that specifically targets GGCC sites, toward sites that are not
recognized to any measurable degree by the wild-type enzyme. M.HaeIII
belongs to the prokaryotic restriction-methylation system that includes
hundreds of different enzyme families, each with a different target DNA
specificity. Their catalytic domains mediate the methyl transfer from
the co-factor S-adenosylmethionine (SAM), and are relatively conserved
(7,8). However, their target recognition domains (TRDs) have diverged
such that the evolutionary trajectories that led to the different
methylation specificities we see today cannot be inferred. For example,
the TRDs of methyltransferases that target GC-contacting sites such as
GGCC (M.HaeIII) and GCGC (M.HhaI), although showing structural
resemblance (9), show only 26% sequence identity with more than 26
positional gaps due to insertions and/or deletions (Supplementary
Figure S1). Further, as shown below, M.HaeIII shows no promiscuous
methylation of GCGC sites, nor does M.HhaI with GGCC sites. The
sequences and functions of these enzymes are therefore highly diverged,
and the evolutionary trajectories that may connect these enzymes or, in
fact, other DNA methyltransferase families, remain unknown.

Modifying the target specificities of DNA methyltransferases in the
laboratory has also proven challenging. Enzymes with relaxed
specificities were obtained by laboratory evolution (10). Increases in
promiscuous 'star' activities (methylation of sites that differ from the
original target site by one base) were also reported, but these resulted
in enzymes that methylate non-palindromic sites (11-13). Here, we
pursued a complete switch in target specificity: M.HaeIII was evolved to
variants that methylate novel palindromic sites such as GCGC and barely
recognize M.HaeIII's original target site. These non-overlapping
specificities became accessible via 'generalist' intermediates that
emerged under selection for M.HaeIII's promiscuous methylation
activities.

MATERIALS AND METHODS 

Detailed experimental protocols are provided as 
Supplementary Data, available online. 

Plasmids and strains 

The M.HaeIII open reading frame (Supplementary Figure S3C) was cloned
with a modified N-terminal His-tag into the pASK-IBA3+ vector (IBA,
ampicillin resistance). The plasmid was modified to introduce the
different methylation/restriction sites used for the selection of the
different specificities (Supplementary Figure S3). Plasmids were
transformed into Escherichia coli strain ER2267 (EcoK r- m- McrA-
McrBC- Mrr-), or MC1061 [mcrA0 relA1 mcrB1 hsdR2(r- m+), plus pGro7,
Takara, chloramphenicol resistance], in which DNA methylation is not
toxic (14). Transformants were selected by growth in the presence of
ampicillin or both ampicillin and chloramphenicol accordingly.

Stabilization by consensus mutations 

Sequences orthologous to the wild-type M.HaeIII sequence were collected
using the Basic Local Alignment Search Tool for Proteins (BLASTP) within
the REBASE database (15). About 55 non-redundant family members
(identity <95%), homologs of M.HaeIII, were aligned using Multiple
Sequence Comparison by Log-Expectation (MUSCLE) (16) (Supplementary
Figure S2). Eight positions in which M.HaeIII deviates from the most
probable amino acid at a given position were identified, and exchanges
into the consensus amino acids were individually examined. Beneficial
mutations were identified by higher methyltransferase activity in crude
lysates (assayed with a DIG-biotin DNA substrate) (17) and a higher
soluble enzyme fraction as indicated by SDS-PAGE. The four mutations
with the largest stabilizing effects (C26A, I104K, M115L and F181L) were
introduced into wild-type M.HaeIII to give the starting point for
directed evolution (Supplementary Figure S2).
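
The consensus analysis described above can be illustrated with a minimal sketch. It assumes a gap-padded multiple sequence alignment is already available as equal-length strings. The function name `consensus_deviations` and the toy sequences are hypothetical and are not taken from the actual M.HaeIII alignment.

```python
from collections import Counter

def consensus_deviations(query, homologs):
    """List (position, query_residue, consensus_residue) for every aligned
    column where the query differs from the most common homolog residue.
    Positions are 1-based; gap characters ('-') are ignored."""
    deviations = []
    for i, query_residue in enumerate(query):
        column = [seq[i] for seq in homologs if seq[i] != "-"]
        if query_residue == "-" or not column:
            continue
        consensus_residue, _count = Counter(column).most_common(1)[0]
        if consensus_residue != query_residue:
            deviations.append((i + 1, query_residue, consensus_residue))
    return deviations

# Toy, made-up alignment (NOT real methyltransferase sequences).
toy_query = "MCIKF"
toy_homologs = ["MAIKL", "MAKKL", "MAIKL", "MCIKL"]
candidates = consensus_deviations(toy_query, toy_homologs)
print(candidates)
```

Each reported position is only a candidate for a back-to-consensus exchange. In the procedure described above, each exchange was then tested individually for activity and solubility.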

Library construction and selection 

Random mutagenesis was performed by PCR using an error-prone polymerase
(GeneMorph Mutazyme, Stratagene), with the stabilized M.HaeIII gene or
the pool of M.HaeIII variants from the previous round as template, and
primers that flank the methylase's open reading frame. The protocol was
optimized to an average of 2.2 ± 1.6 mutations per gene. The mutated
M.HaeIII genes were recloned into the modified pASK plasmid and
transformed into E. coli MC1061 or ER2267. A strategy developed for
cloning DNA methyltransferases (18) was adopted for selection (Figure
2A). Each round of evolution, or generation (noted as 'G'), included
three cycles of enrichment (transformation, growth, plasmid extraction
and digestion with a suitable restriction enzyme for the desired target
site; see Supplementary Methods and Figure 2A for details). In each
round, the methyltransferase expression levels were gradually reduced,
starting from over-expression (0.2 µg/ml anhydrotetracycline inducer,
with GroEL/ES over-expression, 0.05% arabinose) down to basal expression
(no anhydrotetracycline). At the end of each round, 8-12 randomly chosen
clones were isolated and sequenced, and their in vivo activities were
determined.
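
As a rough illustration of what an average of 2.2 mutations per gene implies for library composition, the sketch below assumes that mutation counts per gene follow a Poisson distribution. That distributional assumption is ours and is not stated in the text; the helper name `poisson_pmf` is likewise illustrative.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k mutations under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_mutations = 2.2  # average mutations per gene, as reported in the text

# Fraction of library members expected to carry no mutation at all.
fraction_unmutated = poisson_pmf(0, mean_mutations)

# A Poisson distribution has a standard deviation of sqrt(mean).
poisson_sd = math.sqrt(mean_mutations)

print(f"unmutated fraction ~ {fraction_unmutated:.3f}")
print(f"Poisson standard deviation ~ {poisson_sd:.2f}")
```

Under this assumption the model's spread is close to the reported ±1.6, which suggests, but does not establish, that a Poisson description is reasonable here.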

Negative selections 

Enrichment for variants with reduced GGCC methylation activity was
performed by digestion of the plasmid pool with NotI, whose site is
located downstream of the methyltransferase gene. The NotI site contains
an HaeIII site (GCGGCCGC). Unmethylated NotI sites were digested,
thereby enabling ligation of a suitable linker (Supplementary Figure
S4A). Variants that accommodated the linker were selectively amplified
by PCR using a reverse primer that annealed to the ligated linker and a
forward primer that annealed upstream of the methyltransferase gene.
Negative selections were applied as specified in Figure 2B.
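
The negative selection relies on the HaeIII site being embedded in the NotI site. A minimal sketch of that containment check follows; the helper name `find_sites` is illustrative.

```python
def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of `site`
    in `sequence` (overlapping occurrences included)."""
    width = len(site)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == site]

NOTI_SITE = "GCGGCCGC"   # NotI recognition sequence, as given in the text
HAEIII_SITE = "GGCC"     # M.HaeIII target site

positions = find_sites(NOTI_SITE, HAEIII_SITE)
print(positions)  # [2]
```

Because GGCC sits inside the NotI sequence, a variant that still methylates GGCC modifies the NotI site as well. According to the text, only unmethylated NotI sites were digested and could accept the linker.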

Methylation assays 

Plasmid DNA was extracted (from library pools or individual variants),
treated with the restriction enzymes (10-20 U, 2 h at 37°C) and analyzed
by gel electrophoresis. The same procedure was applied with genomic DNA
extracted with the Sigma kit. DNA substrates for in vitro assays were
prepared by primer extension (26-nt forward templates carrying a single
restriction/methylation site, 12-nt biotinylated reverse primers, exo-
Klenow fragment polymerase, NEB, 1 h, 37°C). The double-stranded DNA
products were analyzed on 4% agarose type XI gels (Sigma). A list of all
DNA substrates is available in Supplementary Methods. 'Time-dependent
methylation assays' were performed as described previously (19), using
3H-labeled SAM (~0.3 µM) and different enzyme and DNA substrate
concentrations (10-700 nM; Supplementary Figure S5B). Aliquots taken at
different time points were quenched and transferred to
streptavidin-coated ScintiPlate wells (PerkinElmer), and 3H levels were
determined using the Wallac MicroBeta TriLux counter (PerkinElmer). KM
(for DNA) and vmax values were derived by fitting initial rates to the
Michaelis-Menten model using GraphPad Prism. Error ranges relate to the
standard errors observed in two or more independent experiments.
'End-point assays', aimed at detecting weak and promiscuous activities
(Figures 1 and 3), were performed as above but at saturating, rather
than initial rate, conditions, namely, using high enzyme concentrations
(2-4 µM) and long incubation times during which favored DNA substrates
were completely methylated (0.5-5 h).

Figure 1. Methylation of different target sites by wild-type M.HaeIII
for detection of promiscuous activities. (A) End-point methylation
activity assay of wild-type M.HaeIII with the original GGCC sequence,
with promiscuous non-palindromic 'star sites' (AGCC, GGCT and GGCG)
(11), and with the newly identified expanded palindromic sites:
GG(A/T)CC (M.AvaII-like) and GGCGCC (M.NarI-like). These palindromic
sites show similar specificity as the original site, as indicated by the
lower methylation of related 'star' sites (control sites) ([E]0 = 2 µM;
[DNA substrate] = 0.67 µM; [3H-SAM] = 0.2 µM; 20% glycerol; 5-h
incubation time at 37°C). (B) Plasmid protection assay: the plasmid
encoding wild-type M.HaeIII was transformed to E. coli and expressed
without induction. The plasmid was subsequently extracted and treated
with different restriction enzymes as noted. (C) Rates of 3H-methyl
incorporation were measured with the original DNA target (GGCC, right
Y-axis) and with different promiscuous target sites (left Y-axis).
Inset: the derived initial velocities with the different target
sequences ([E]0 = 0.2 µM; [DNA substrate] = 0.5 µM; [3H-SAM] = 0.2 µM;
20% glycerol at 37°C). UC, uncut plasmid.
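
The derivation of KM and vmax from initial rates, as used in the time-dependent methylation assays, can be sketched as follows. The published values were fitted with GraphPad Prism. This sketch instead uses a Hanes-Woolf linearization ([S]/v plotted against [S]) with ordinary least squares, chosen only because it needs no external library. The data are synthetic and noise-free, and all names are illustrative.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten model."""
    return vmax * s / (km + s)

def hanes_woolf_fit(substrate, rates):
    """Estimate (vmax, km) from the line [S]/v = [S]/vmax + km/vmax,
    fitted by ordinary least squares."""
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # equals 1 / vmax
    intercept = mean_y - slope * mean_x  # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km

# Synthetic data: substrate concentrations in nM, rates in arbitrary units.
true_vmax, true_km = 0.4, 58.0
dna_nM = [10, 50, 100, 300, 700]
initial_rates = [michaelis_menten(s, true_vmax, true_km) for s in dna_nM]

fit_vmax, fit_km = hanes_woolf_fit(dna_nM, initial_rates)
print(fit_vmax, fit_km)
```

With real, noisy measurements a direct nonlinear fit is generally preferable, since linearized forms weight the errors unevenly.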

Enzyme purification 

Plasmids were transformed to MC1061::pGro7 cells. GroEL/ES and
methyltransferase over-expression was induced with arabinose and
anhydrotetracycline, after which the cell pellets were disrupted by
addition of lysozyme and sonication. The methyltransferase variants were
purified by Ni-NTA chromatography (nickel-nitrilotriacetic acid column,
QIAGEN) with the addition of 1 mM adenosine triphosphate (ATP) to
dissociate the chaperones, concentrated and stored at -80°C with 10%
glycerol.

Figure 2. The selection methodology and trajectories for new and novel
target specificities. (A) Selection by plasmid protection. M.HaeIII's
open reading frame was randomly mutated by error-prone PCR. The library
was cloned and transformed to E. coli. Within each transformed cell, the
expressed methyltransferase variant, if active, methylated its encoding
plasmid and thereby protected it from digestion with the cognate
restriction enzyme (18). Following digestion [e.g. with AvaII, for the
GG(A/T)CC specificity], the surviving plasmids were retransformed, and
subjected again to restriction for further enrichment of active
methylase variants. After two cycles of enrichment (digestion and
transformation), the plasmid DNA was extracted and the surviving
M.HaeIII genes were amplified and randomly mutagenized (as a pool) for
the next round. (B) Starting from the DNA methyltransferase M.HaeIII,
different trajectories for new (black lines) or novel target sites (blue
and green lines) were followed. Noted as 'G' (generation numbers) are
the rounds of mutation and selection underlying these trajectories. The
asterisk symbol denotes rounds in which negative selection against GGCC
methylation was applied. SsrA denotes rounds in which the SsrA
degradation tag was fused to the selected variants to increase the
selection pressure for higher specific activity.



RESULTS 

Promiscuous targets of M.HaeIII

To identify its promiscuous activities, M.HaeIII was reacted with an
array of different DNA target sites using a radiolabeled methyl donor,
3H-SAM (19). We tested known 'star' sites of M.HaeIII (11), and
palindromic sites based on the proposed mechanism of expansion or
shrinkage of existing target sites (20). Two such sequences were clearly
methylated: GGCGCC and GG(A/T)CC (Figure 1). These palindromic sequences
contain the original GGCC sequence with an internal insertion (A/T or
CG). The 'star' sequences of these targets (e.g. GGTCT versus GGTCC;
Figure 1) showed much lower methylation, supporting the notion of
recognition of an extended site (20). The newly identified extended
sites comprised our first targets for directed evolution.

Figure 3. Methylation activities of the evolved variants from the 10th
round (G10). Shown are the activities toward the new, evolved sites
versus the original site, GGCC. The mutations in these variants are
listed in Supplementary Table S2. (A) Plasmid protection was assayed
under the same conditions as in Figure 1B. All variants show a
significant increase in protection of the evolved sites (full
protection, in most variants) and partial, or even no, protection
against HaeIII digestion. Wild-type (WT) M.HaeIII shows protection only
against HaeIII digestion. Arrows indicate the in vitro characterized
variants. (B) In vitro methylation activity of purified Round 10
variants: N3 = NarI-selected variant, T2 = TauI-selected variant, A4 =
AvaII-selected variant ([E]0 = 2 µM; [DNA substrate] = 0.67 µM;
[3H-SAM] = 0.25 µM; at 37°C). Aliquots of the reaction mixture were
quenched at different times, and the level of methyl incorporation was
determined. (C) End-point activity assay for the same variants. Noted in
dark gray are the novel activities that were not selected for, and were
undetectable in wild-type (A). C* represents a methylated base in
hemimethylated substrates ([E]0 = 4 µM for N3 and A4, 2 µM for T2; [DNA
substrate] = 0.67 µM; [3H-SAM] = 0.2 µM; 5-h incubation time at 37°C).
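
The relationship between GGCC and the two promiscuous sites, GGCGCC and GG(A/T)CC, can be checked with a short sketch: each is the original site with an insertion between its two half-sites, and each remains palindromic in the reverse-complement sense. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(site):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(site))

def is_expansion_of(site, core="GGCC"):
    """True if `site` is the core with an insertion between its two halves."""
    half = len(core) // 2
    return (len(site) > len(core)
            and site.startswith(core[:half])
            and site.endswith(core[half:]))

# GGCGCC reads the same on both strands.
print(reverse_complement("GGCGCC") == "GGCGCC")

# GG(A/T)CC is palindromic as a degenerate site: GGACC on one strand
# is GGTCC on the other.
print(reverse_complement("GGACC"))

for site in ("GGCGCC", "GGACC", "GGTCC", "GCGC"):
    print(site, is_expansion_of(site))
```

GCGC is not an expansion of GGCC in this sense, in line with the text treating it as a novel site rather than an extended one.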

Increasing M.HaeIII's evolvability

DNA methyltransferases, including M.HaeIII, have proven somewhat
resistant to laboratory evolution despite the application of
ultra-high-throughput selections (12,21). These difficulties may relate
to the low stability of these enzymes. Indeed, when over-expressed in
E. coli, M.HaeIII, an enzyme isolated from Haemophilus aegyptius, tends
to aggregate. Low stability dramatically reduces the fraction of
properly folded and functional mutants, and thereby limits the ability
to acquire new functions (22). We therefore applied new approaches aimed
at circumventing the setbacks caused by the destabilizing effect of
mutations, and by the stability-activity trade-offs that underlie the
evolution of new protein functions (22-24).

One approach to increase M.HaeIII's stability and evolvability was to
introduce ancestor/consensus mutations, thereby increasing its ability
to accept a wider range of mutations (23). Accordingly, we identified a
combination of four consensus mutations (C26A, I104K, M115L and F181L)
that led to a >6-fold increase in the concentration of soluble active
enzyme (Supplementary Figure S2). Second, the libraries of mutated
M.HaeIII genes were co-expressed with the chaperones GroEL/ES to buffer
the effects of destabilizing mutations (24,25) (Supplementary Figure
S2D).

Third, M.HaeIII was neutrally drifted to accumulate a wide variety of
mutations while maintaining its original GGCC specificity (26) (Figure
2A). To this end, M.HaeIII's open reading frame was randomly mutated by
error-prone PCR at an average of 2.2 ± 1.6 mutations per gene. The
resulting library was cloned into an expression vector and transformed
to E. coli. Within each transformed cell, the expressed
methyltransferase variant, if active, methylated its encoding plasmid
and thereby protected it from digestion with the cognate restriction
enzyme (18) (Figure 2). The surviving plasmids were retransformed, and
subjected again to HaeIII restriction for further enrichment of active
methylase variants. After two cycles of enrichment (digestion and
transformation), the plasmid DNA was extracted, and the surviving
M.HaeIII genes were amplified and randomly mutagenized (as a pool) for
the next round. Two such rounds of mutagenesis and selection by HaeIII
digestion were performed to give an ensemble of polymorphic mutants that
were all folded and functional with an average of 2.2 ± 1.5 mutations
per gene. This neutrally drifted ensemble was used as the starting point
for the selection of M.HaeIII variants that efficiently methylate target
sites other than GGCC.

Divergence toward M.HaeIII's promiscuous activities

Selections to amplify the methylation of promiscuous target sites were
performed in the same way as the neutral drift. However, instead of
digesting with the cognate HaeIII, the plasmid pools of the various
libraries were digested with restriction enzymes that recognize the
target sites, such that only variants that methylated the new target
sites survived [e.g. AvaII for the GG(A/T)CC sites; Figure 2B and
Supplementary Figure S3]. To control the selection pressure, the
methyltransferase gene libraries were cloned under the tet promoter
(anhydrotetracycline-induced expression). The selection plasmids also
carried the desired methylation-restriction target sites. GroEL/ES, the
E. coli chaperonin, was over-expressed from a second plasmid (24)
(Takara). Once the evolved libraries retained a satisfactory level of
methylation activity (10^5 or more surviving clones), chaperonin
co-expression was removed, and the methyltransferase expression level
was reduced to the basal level (no inducer). The selection pressure was
also augmented by recloning to plasmids that carried a larger number of
target sites (Supplementary Figure S3).

Following this procedure, the neutrally drifted pool (denoted as G2) of
M.HaeIII was initially evolved toward three different specificities
(Figure 2B): the newly identified promiscuous sites, GGCGCC and
GG(A/T)CC, comprised our first targets for evolution (digestion with
NarI and AvaII, respectively). As we were primarily interested in
trajectories that may have actually occurred in nature, we chose
GC(G/C)GC as the third target (digestion with TauI). Phylogenetically,
methyltransferases with GC(G/C)GC specificity seem to be the closest
paralogs of M.HaeIII, although the sequences of the TRDs of these two
families are quite diverged (37% identity, more than 40 positional gaps;
Supplementary Figure S1). M.HaeIII exhibited some methylation of
GC(G/C)GC sites, although at a barely detectable level (<10-fold, lower
than with other promiscuous sites; Figure 1A).

As opposed to the first two target sites, no survivors were observed
after TauI digestion, most likely due to the very weak promiscuous
activity toward this site. However, after selection with NarI, we could
obtain TauI-protected plasmids. Thus, the NarI G3 library was further
mutated, split and selected with either NarI or TauI. Four additional
rounds of random mutagenesis and selection were applied for these three
different specificities (G4-G7; Figure 2B). By the seventh round, the
pools of variants from all three trajectories showed marked increases in
methylation of the target sites they were selected for, and some
reduction in methylation of M.HaeIII's original target, GGCC.

To increase the selectivity of the evolving variants, we co-selected for
the survival of unmethylated GGCC plasmids (Supplementary Figure S4).
Negative selection (denoted by an asterisk in Figure 2B) was applied
starting from Round 7. However, we found that its main effect was a
parallel decrease in methylation of both the new and the original sites
(as observed in plasmid protection assays in Supplementary Figure S4B).
Thus, additional rounds of selection for the new target sites (positive
selections) were necessary.

The emergence of novel activities 

By the end of the 10th round (G10), several randomly picked variants
from each trajectory were sequenced and their methylation activities
were assayed (Figure 3 and Supplementary Table S1). Most tested variants
showed complete protection from digestion by the restriction enzymes
they were selected with, and a decrease in the original GGCC methylation
activity. Representative variants from each trajectory were also
purified and characterized in vitro (Figure 3B). The evolved variants
showed a significant increase in the rate of methylation of the evolved
target sites, but they were unstable and lost most of their activity
during purification. The TauI-selected variant T2 showed the most
dramatic improvement, as methylation of GC(G/C)GC by wild-type M.HaeIII
could barely be detected (Figure 3B).

The G10 variants can be considered as bi-functional evolutionary
intermediates: they methylated both the newly evolved target sites and
the original one with only a mild preference toward the former (Figure
3B). However, a wider screen indicated that the G10 variants also
methylated target sites that were neither selected for, nor methylated
by wild-type M.HaeIII (Figure 3C). Thus, these variants behaved as
'generalists' (6).

Several novel target sites were detected (Figure 3C). Most notably, the
TauI-selected variant [GC(G/C)GC] could also methylate GCGC, thus
exhibiting HhaI-like methylation activity. Likewise, the AvaII-selected
variant [GG(A/T)CC] methylated GGATCC (BamHI) and GGTACC (KpnI) sites.
In fact, the 'generalist' trend was already seen in the divergence of
TauI methylation [GC(G/C)GC]. This activity could barely be detected in
wild-type M.HaeIII, and could not be evolved directly. However, variants
selected for NarI sites (GGCGCC) readily diverged to give TauI
methylation [GC(G/C)GC].

11632 Nucleic Acids Research, 2012, Vol. 40, No. 22


Evolution toward novel target specificities 

The 'generalist' G10 variants were used as the starting points for
divergence toward novel target sites. Additional rounds of mutagenesis
and selection were applied along two new trajectories (Figure 2B): the
TauI-selected pool (G9) was now selected with HhaI for GCGC methylation,
and the AvaII-selected pool (G10) was selected with BamHI for GGATCC
methylation. The already pursued trajectory of GG(A/T)CC methylation
(AvaII) was continued in parallel. Within two rounds, the selected pools
showed a marked increase in plasmid protection against digestion with
the cognate restriction enzymes. However, at this stage, the dynamic
range of our selection system was exhausted (i.e. the enzyme was
sufficiently active to protect its encoding plasmid, even at basal
expression of the methyltransferase variants and without GroEL/ES
over-expression). Yet when selected variants were purified, their
enzymatic methylation activity was found to be relatively low.

To increase the selection pressure and obtain higher catalytic
efficiencies, we had to reduce the cellular enzyme doses, and thereby
force the evolved variants to increase their specific enzymatic
activity. Therefore, we fused an 11-amino acid SsrA degradation tag to
the methyltransferase's C-terminus. This tag targets the expressed
enzyme variants for rapid degradation by the ClpXP protease (27).
Indeed, whereas wild-type M.HaeIII fully protects its plasmid against
HaeIII digestion at basal expression, upon fusion of the SsrA tag
protection could be observed only under over-expression (Supplementary
Figure S4C). The pools selected for HhaI (G16), AvaII (G16) and BamHI
(G20, Figure 2B) specificities were further mutated and recloned to a
modified pASK vector that carried the SsrA tag (Supplementary Figure
S3). The selections were initially performed with the maximal level of
expression (100 ng/ml of the anhydrotetracycline inducer), and the
inducer's levels were subsequently reduced (to 5 ng/ml in the HhaI and
AvaII selections, and to 30 ng/ml in the BamHI selection). Under these
conditions, the libraries showed measurable survival after digestion
with the cognate restriction enzymes (>10%; Supplementary Figure S4D).

The evolved novel variants 

Following additional rounds of selection with the SsrA tag (G20-G22,
Figure 2B), several variants were randomly picked (Supplementary Table
S1). To enable their large-scale production and purification, the SsrA
tag had to be removed. However, upon recloning without the SsrA tag, the
evolved variants exhibited considerable toxicity, and viable
transformants were found to carry additional mutations. Eventually,
after sequencing many transformants, the tag was successfully removed
from most variants. Their methylation activity could be tested in vivo
(Figure 4), and subsequently with purified enzymes. Nonetheless, an
inverse correlation was observed between the methylation activity of a
variant and the growth rate of E. coli cells carrying it (Supplementary
Figure S4E).

The tested variants showed high methylation activities as detected by
the plasmid protection assay of the evolved target sites (Figure 4).
Unlike the intermediate variants from G10, methylation activity of the
original GGCC target was marginal in some variants and undetectable in
others. That the methylation rates of the G20-G22 variants are
comparable with wild-type was also indicated by their ability to fully
protect the genomes of their host E. coli cells, even at basal
expression levels (more than 30 000 HhaI, GCGC sites, e.g. Supplementary
Figure S5A).

Figure 4. Methylation activities of variants evolved toward new and
novel target specificities. Shown are plasmid protection assays
performed under the same conditions as in Figure 1B (plasmids were
digested with the different restriction enzymes as noted, in parallel
with NcoI digestion for plasmid linearization). Arrows indicate the in
vitro characterized variants. (A) Randomly picked variants from the last
round (G20) of selection for methylation of AvaII sites [GG(A/T)CC]. All
eight variants confer complete protection of the AvaII sites selected
for (i.e. complete methylation, left panel) and variable degrees of
protection of the original sites (e.g. complete protection by A2 versus
no protection with A7). UC = plasmid treated only with NcoI. M.HaeIII
was grown and digested under the same conditions. (B) Seven randomly
picked HhaI-selected variants (GCGC) from G20. These variants completely
lost the original activity, as indicated by no protection against HaeIII
digestion. (C) Seven randomly picked BamHI-selected variants (GGATCC)
from G21 showed a marked increase in the protection of the evolved
target site, as well as a reduction in protection against HaeIII. WT,
wild-type.

Table 1. Kinetic parameters for methylation of different DNA target
sites by wild-type M.HaeIII and its evolved variants.

Target site         Enzyme                kcat (min^-1)   KM (nM)       kcat/KM (M^-1 min^-1)   Fold change (a)
GGCC (original)     Wild-type             0.4 (±0.02)     58 (±14)      6.9 × 10^6
GGCC (original)     A3 (AvaII selected)   0.09 (±0.01)    165 (±43)     5.3 × 10^5              0.08
GGCC (original)     B5 (BamHI selected)   0.11 (±0.02)    728 (±211)    1.5 × 10^5              0.02
GGCC (original)     H3 (HhaI selected)    0.02 (±0.0)     94 (±23)      1.7 × 10^5              0.02
GGCC (original)     H4 (HhaI selected)    0.02 (±0.0)     121 (±40)     1.4 × 10^5              0.02
GGACC (evolved)     Wild-type             0.03 (±0.0)     189 (±32)     1.5 × 10^5
GGACC (evolved)     A3 (AvaII selected)   0.53 (±0.01)    70 (±6)       7.6 × 10^6              50.7
GGTACC (evolved)    Wild-type             ND              ND            <30 (c)
GGTACC (evolved)    B5 (BamHI selected)   0.07 (±0.0)     116 (±26)     5.6 × 10^5              >1.9 × 10^4
GGATCC (evolved)    Wild-type             ND              ND            <30 (c)
GGATCC (evolved)    B5 (BamHI selected)   0.14 (±0.01)    193 (±38)     7.3 × 10^5              >2.4 × 10^4
GCGC (evolved)      Wild-type             ND              ND            <30 (c)
GCGC (evolved)      H3 (HhaI selected)    0.1 (±0.0)      42 (±6)       2.4 × 10^6              >8 × 10^4
GCGC (evolved)      H4 (HhaI selected)    0.12 (±0.01)    53 (±10)      2.2 × 10^6              >7.3 × 10^4

Specificity shift (b): A3, >6.6 × 10^2; B5, >1.1 × 10^6 / >8.6 × 10^5;
H3, >3.2 × 10^6; H4, 3.2 × 10^6.

(a) The ratio of kcat/KM values of the evolved variant versus wild-type
M.HaeIII.
(b) The overall change in the ratio of kcat/KM values for the new or
novel evolved target site versus the original GGCC site.
(c) The minimal detectable activity, given the background reads, enzyme
concentrations and incubation times applied, corresponds to a kcat/KM
value of <30 M^-1 min^-1.
ND, not detected.

Kinetic constants were determined by measuring the initial rates of
methylation with purified enzymes at different concentrations of target
DNA substrates (Table 1 and Supplementary Figure S5B). The variant
displaying the highest catalytic efficiency (kcat/KM) was the
M.AvaII-like A3 that was selected toward GG(A/T)CC sites. Its kcat/KM
for the evolved GG(A/T)CC target site was 7.6 × 10^6 min^-1 M^-1
(51-fold increase relative to wild-type, and comparable with the
wild-type's rate with GGCC). A3's kcat/KM toward the original GGCC
target was 5.3 × 10^5 min^-1 M^-1 (13-fold lower than wild-type).
Overall, it exhibited a shift of ~660-fold in specificity.
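
The fold changes and the specificity shift quoted for A3 follow directly from the kcat/KM values in Table 1. The arithmetic is sketched below with illustrative variable names.

```python
# kcat/KM values (M^-1 min^-1) taken from Table 1.
wt_original = 6.9e6   # wild-type with GGCC
wt_evolved = 1.5e5    # wild-type with the evolved GGACC site
a3_original = 5.3e5   # A3 with GGCC
a3_evolved = 7.6e6    # A3 with the evolved GGACC site

# Fold change of the variant relative to wild-type, per target site.
gain_on_evolved = a3_evolved / wt_evolved      # about 51-fold increase
loss_on_original = a3_original / wt_original   # about 0.08, i.e. ~13-fold lower

# Specificity shift: the change in the evolved/original preference.
specificity_shift = gain_on_evolved / loss_on_original

print(round(gain_on_evolved, 1))
print(round(1 / loss_on_original, 1))
print(round(specificity_shift))
```

The result lands at roughly 660, matching the shift reported in the text.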

Even larger changes in specificity were exhibited by the HhaI-selected
variants H3 and H4, and by the BamHI-selected variant B5. Wild-type
M.HaeIII shows no methylation of these sites, GCGC and GGATCC, even at
the highest enzyme concentrations (Figure 1). In these cases, the
>10^4-fold increases in methylation rates are lower bounds, because
wild-type M.HaeIII exhibits no detectable methylation of these sites
(given the background reads, maximal enzyme concentrations and
incubation times applied, the minimal detectable activity corresponds to
a kcat/KM value of <30 min^-1 M^-1). Additionally, the HhaI-selected
variants H3 and H4 showed wild-type-like catalytic efficiency toward the
evolved GCGC sequence (kcat/KM of ~2.3 × 10^6 min^-1 M^-1), and a much
lower methylation rate of the original GGCC site (~45-fold decrease
relative to wild-type M.HaeIII).
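
Where wild-type activity is undetectable, the fold increases can only be stated as lower bounds, obtained by dividing the variant's kcat/KM by the detection limit of 30 min^-1 M^-1. A minimal sketch, with an illustrative function name:

```python
DETECTION_LIMIT = 30.0  # minimal detectable kcat/KM (M^-1 min^-1), from the text

def lower_bound_fold(variant_kcat_km, limit=DETECTION_LIMIT):
    """Smallest fold increase consistent with an undetectable wild-type rate.
    The true wild-type value is below `limit`, so the true fold is larger."""
    return variant_kcat_km / limit

# kcat/KM values for the evolved sites, taken from Table 1.
for name, value in [("B5, GGATCC", 7.3e5), ("H3, GCGC", 2.4e6), ("H4, GCGC", 2.2e6)]:
    print(name, f">{lower_bound_fold(value):.2g}")
```

These bounds correspond to the ">" entries in the fold-change column of Table 1.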

Overall, the activity changes observed in all the evolved variants were
manifested primarily in higher kcat values, and only minor changes in KM
were observed. This trend is in agreement with the tendency of DNA
methyltransferases to bind cognate and non-cognate target sites with
comparable affinities, but methylate them with very different kcat
values (28). The one exception was the BamHI-selected variant B5.
Although it was selected toward GGATCC methylation, its methylation rate
of GGTACC was similar (7.3 × 10^5 and 5.6 × 10^5 min^-1 M^-1,
respectively). Curiously, the 46-fold reduction in its catalytic
efficiency with the original GGCC site was achieved mainly through an
increase in KM.

To assess the specificity of the newly evolved variants,
the methylation of a wide range of DNA target sites was
tested (Supplementary Figure S5C). This was done by an
end-point assay, similar to the one used for detecting the
promiscuous activities of M.HaeIII and the intermediate
variants (Figure 1). In all cases, and despite the end-point
format, a clear preference was observed toward the
evolved target sites relative to the original GGCC site.
Further, no cross-reactivity was detected between the
HhaI and the AvaII/BamHI trajectories; i.e. the
HhaI-selected variants showed no methylation of
GG(A/T)CC or GGATCC sites, and vice versa (also tested
in vivo, Supplementary Figure S5D). However, in the same
manner that the selection for promiscuous sites opened
the door for novel methylation sites that were not
selected for, the newly evolved variants also recognized
other target sequences. In line with the trend we
observed with wild-type and the intermediate variants,
promiscuous methylation of extended palindromic
versions of the selected target sites could be observed
[e.g. GC(A/T)GC and GC(AT/TA)GC in HhaI-selected
variants]. However, the methylation of non-palindromic
'star' sites was the most apparent. In fact, the
HhaI-selected variants (GCGC) were found to methylate
GCGN or NCGC sites, where N corresponds to any base.
Nonetheless, sites in which either of the two inner
bases was modified (GTGC, GCTC), or both external
bases were modified (e.g. TCGT), showed essentially no
methylation. It was clear, therefore, that the methylation
selectivity of these variants was primarily obtained
through recognition of the first three bases, either on the
'plus' or on the 'minus' strand.
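
The three-base rule described above can be written as a simple predicate. The following is a minimal sketch: it only restates the observed pattern (GCG at the first three positions of either strand) and does not predict methylation rates. The function names are hypothetical and not from the study.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(site: str) -> str:
    """Return the site as read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(site.upper()))


def matches_three_base_rule(site: str, core: str = "GCG") -> bool:
    """True if the first three bases on the 'plus' or 'minus' strand
    equal `core`, i.e. the site has the form GCGN or NCGC."""
    site = site.upper()
    return site.startswith(core) or reverse_complement(site).startswith(core)


# Sites mentioned in the text: the first three fit the GCGN/NCGC pattern,
# the last three are the examples reported as essentially unmethylated.
for candidate in ["GCGC", "GCGA", "TCGC", "GTGC", "GCTC", "TCGT"]:
    print(candidate, matches_three_base_rule(candidate))
```

Checking the reverse complement is what makes NCGC equivalent to GCGN read from the other strand.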

DISCUSSION 

Despite natural methyltransferases being highly divergent, 
and the ease of selection (active DNA methyltransferases 
protect their own encoding genes from restriction), the 
laboratory divergence of new methylation target sites 
has thus far proven challenging. Three major elements 
made it possible. The first one is boosting evolvability 
by the incorporation of stabilizing, compensatory muta- 
tions (23), by chaperonin buffering (24) and by neutral 
drift (26). Their roles in enabling the trajectories described 
here support the notion that the destabilizing effects of 
mutations comprise a major limiting factor in protein evo- 
lution (22,23,29-32). 

The second element regards the expansion-shrinkage
mode by which new methylation target sites emerge.
Earlier attempts made use of 'star' activities and only
yielded mild changes in specificity (10), or methylation
of non-palindromic sites (12,13). An alternative to the
'star' mode is the expanded site mode. As indicated (20),
M.EcoRV, a GATATC methyltransferase, recognizes its
substrate in a manner similar to EcoDam (GATC),
whereby DNA bending enables the accommodation of
the expanded site. This proposed evolutionary mode (20)
is strongly supported by our results: we found that
wild-type M.HaeIII promiscuously methylates expanded
sites such as GG(A/T)CC, and could be readily evolved
to methylate them with high efficiency and specificity.
Via divergence to further expanded target sites,
non-overlapping specificities such as GGATCC emerged.
Finally, by shrinkage of the evolved GC(G/C)GC target
site, variants that methylate the novel target site GCGC
(M.HhaI-like) emerged. Indeed, the natural
methyltransferases for our evolving target specificities
(like M.HhaI for GCGC) are only remotely related to the
starting point M.HaeIII, and the sequence and structural
homology is largely restricted to their catalytic domains.
However, as demonstrated here, recognition of an
expanded, or a shrunken, target site might have been the
mechanism for changing DNA recognition specificity in
ancestral methyltransferases. Further selection for
specificity, and drift, blurred nearly all remnants of
common ancestry.
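
The relationships between the target sites named above can be illustrated as string operations. The sketch below is only a way of seeing how the sites relate to one another at the sequence level. It is not a model of the selection experiments, and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site == "".join(COMPLEMENT[b] for b in reversed(site))


def expand(site: str, insert: str) -> str:
    """Insert bases at the centre of a site (expansion)."""
    mid = len(site) // 2
    return site[:mid] + insert + site[mid:]


def shrink_center(site: str, n: int = 1) -> str:
    """Remove n bases from the centre of a site (shrinkage)."""
    start = (len(site) - n) // 2
    return site[:start] + site[start + n:]


# Expansion of the original GGCC site to sites named in the text.
print(expand("GGCC", "A"), expand("GGCC", "T"))   # the two GG(A/T)CC forms
print(expand("GGCC", "AT"))                       # BamHI-like site
print(expand("GGCC", "CG"))                       # NarI-like site

# Shrinkage of GC(G/C)GC to the HhaI-like site.
print(shrink_center("GCGGC"), shrink_center("GCCGC"))
print(is_palindromic("GGATCC"), is_palindromic("GCGC"))
```

Here every expanded or shrunken site is obtained from its predecessor by adding or removing central bases only, which is the sense in which the text speaks of expansion and shrinkage.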

The third divergence-driving element is 'generalist'
intermediates. These comprised the missing link between
the non-overlapping specificities of GGCC (M.HaeIII)
and GCGC (M.HhaI). Upon selection to improve latent,
promiscuous activities, novel activities that were neither
present in the starting point, nor selected for, emerged.
What might be the molecular basis for the emergence of
generalists? The early mutations occurred in residues that
directly contact the GGCC target site (Figure 5).
For instance, in wild-type M.HaeIII, Arg225 interacts
with the first guanine in M.HaeIII's target site [GGCC,
(9)], and mutations into Ser or Gly were the first to be
fixed in the NarI trajectory (methylation of GGCGCC);
they might thereby have facilitated recognition of the
NarI partial sequence CGCC. In a similar manner, the
first mutations in the AvaII trajectory [methylation of
GG(A/T)CC] occurred in Ser224 (to Gly) and in Arg227
(to His), which interact with the internal bases of GGCC,
and these mutations facilitate the ability to methylate
the expanded target site (9). Thus, as indicated by the
broad activity patterns (Figure 1), selection for
methylation of new sites led to mutations in 'gate-keeper'
residues that loosened contacts with the original target
site, rather than creating new contacts with the evolving
target sites.

The relaxation in specificity enabled generalist
intermediates to methylate not only the original and the
newly evolved target sites but also other sites not selected
for. Thus, a complete release of the burden of selection
and transient non-functionalization were not found to be a
prerequisite for the acquisition of a new specificity, at
least not in our experiment. Which mutations, then,
mediated the switch in favor of the novel target sites and
the loss of the original GGCC preference (Figure 5)? One
example might be Arg243Gln, which appeared at the early
stages of the HhaI selection (G10). This mutation
resulted in significantly enhanced methylation of the
evolved GCGC site and a decline in methylation of the
original GGCC site (Supplementary Figure S5E).

Most mutations occurred, however, in residues not
directly interacting with the target DNA (Figure 5C).
Several of the remote mutations locate to regions that
undergo large conformational changes upon DNA
binding (e.g. Leu80His/Cys in the HhaI and in the
BamHI selections, respectively) (33,34). These mutations
may have shifted the conformational equilibrium in favor
of the conformations that target the new sites (35-37).
Other remote mutations act as stability compensators, or
global suppressors that compensate for the loss of stability
associated with the mutations that mediate new activities
(31). A clear example is Asn262Tyr, which appeared in all
the evolved variants regardless of their specificity and
also comprises the consensus in the M.HaeIII family
(Supplementary Table S2). Additionally, some of the
consensus mutations that failed to stabilize wild-type
M.HaeIII (Supplementary Figure S2A) appeared in
evolved variants with new specificities (e.g. Gly77Ala
and Val283Leu in the BamHI and HhaI evolved
variants, respectively; Supplementary Table S1). This
indicates the particular utility of consensus mutations (and
ancestral ones) in promoting the acquisition of new
functions (23).

Figure 5. The location of mutations that underlie the laboratory divergence of new and novel target specificities. Ribbon diagrams are based on
the wild-type M.HaeIII's crystal structure (Protein Data Bank (PDB) code: 1DCT). The cognate DNA substrate (GGCC) is denoted in lines, with the
corresponding bases numbered (G1, G2, C3, C4). M.HaeIII residues in which mutations occurred are denoted as sticks. (A) First shell mutations
fixed along the selections for AvaII methylation and the subsequent BamHI (GGATCC) selection. (B) Likewise for the NarI and TauI selections
and the subsequent HhaI trajectory. In all these trajectories, the mutations fixed in the first rounds were in residues directly contacting the exchanged
DNA bases of the GGCC methylation site. (C) All types of positions at which mutations were fixed throughout the trajectory leading to the
M.HhaI-like methyltransferase (shown are positions with mutation rates >25%). In red, first shell mutations in direct contact with the DNA
target site. In green, remote mutations in regions previously implicated in conformational changes that relate to DNA binding and methylation
(based on structural alignment with M.HhaI) (33,34). In blue, stabilizing mutations, including the initially introduced consensus mutations and
stabilizing mutations that accumulated later in the trajectory. The mutations observed in the AvaII and BamHI selection trajectories are illustrated in
Supplementary Figure S5F.

Although our evolved variants methylate their target
sites with catalytic efficiencies that match those of natural
methyltransferases, they do not match the latter's
specificity. The HhaI-selected variants, H3 and H4,
exhibited kcat/KM values of ~2.3 x 10^6 min^-1 M^-1.
However, they only show ~10-fold preference toward the
evolved GCGC target versus the original GGCC target (in
ratio of kcat/KM values; Table 1 and Supplementary
Figure S3B). The 'star' activities of these variants are also
very high (Supplementary Figure S5C). The methylation of
sites that relate neither to the original target sequence
(GGCC) nor to any of the evolved ones, and were not
tested in our assays, is not likely to occur (Supplementary
Figure S5D), but cannot be completely ruled out at this
stage.

Specificity is a hallmark of the restriction-methylation
system and the key to its biological function in protection
against foreign DNA in general, and particularly in
maintaining cross-species barriers (38). There are, however,
a few examples of natural methyltransferases with similarly
relaxed specificity (39). For example, the CviJI
methyltransferase is less specific toward the last base of its
target sequence, RGC(Y/G), than the endonuclease
(RGCY) (40). Relaxed specificity could be typical of
evolutionary intermediates whereby the methylase
diverges beyond the coverage of its cognate restriction
enzyme. However, contrary to the current dogma of
absolute specificity, broad specificity of the restriction
enzyme may provide more efficient protection, as
recently shown for phage defense by the KpnI nuclease (41).
The specificity of action of the restriction-methylation
system has implications beyond foreign DNA. Broad
methylation specificity might also be deleterious,
possibly due to methylated cytosines being prone to
deamination (42,43). Indeed, our most active variants
became toxic once the SsrA tag that suppressed expression
was removed (Supplementary Figure S4E). It may also be
that, by itself, selection toward a new target site does not
induce absolute specificity, and negative selection against
alternative sites might be necessary. However, the
selection we applied against the original GGCC specificity
resulted in reduced rates with both the newly evolved
and the original sites (Supplementary Figure S4B).
Indeed, trade-offs between fidelity and processivity are
observed throughout, be it in ribosomes (44) or in DNA
methyltransferases (a previously reported M.HaeIII variant
with broadened 'star' activity exhibits >10-fold higher
kcat/KM values with the original GGCC site) (12).

To conclude, our results suggest that 'generalist'
intermediates, i.e. enzyme variants that exhibit activities
toward novel target sites that were neither selected for
nor present in the starting point, comprise a missing link
that bridges remotely related methyltransferases with
non-overlapping specificities such as GGCC and GCGC
(certainly with respect to their TRDs). Generalists may
also comprise the progenitors of other highly diverged
enzyme families (1,3,6). The acquisition and loss of
promiscuous activities is therefore a dynamic process. Drift,
i.e. accumulation of neutral mutations with respect to the
native activity, can reduce or enhance existing
promiscuous activities (45,46). However, as shown here,
mutations that drive the acquisition of new functions also
give rise to other activities, primarily by alleviating
specificity constraints. The trajectories leading from one
activity to another therefore open new side roads that may
in turn lead to other new activities.